Cluster analysis: unsupervised learning via supervised learning with a non-convex penalty
نویسندگان
چکیده
Clustering analysis is widely used in many fields. Traditionally clustering is regarded as unsupervised learning for its lack of a class label or a quantitative response variable, which in contrast is present in supervised learning such as classification and regression. Here we formulate clustering as penalized regression with grouping pursuit. In addition to the novel use of a non-convex group penalty and its associated unique operating characteristics in the proposed clustering method, a main advantage of this formulation is its allowing borrowing some well established results in classification and regression, such as model selection criteria to select the number of clusters, a difficult problem in clustering analysis. In particular, we propose using the generalized cross-validation (GCV) based on generalized degrees of freedom (GDF) to select the number of clusters. We use a few simple numerical examples to compare our proposed method with some existing approaches, demonstrating our method's promising performance.
منابع مشابه
PeakSegJoint: fast supervised peak detection via joint segmentation of multiple count data samples
Joint peak detection is a central problem when comparing samples in genomic data analysis, but current algorithms for this task are unsupervised and limited to at most 2 sample types. We propose PeakSegJoint, a new constrained maximum likelihood segmentation model for any number of sample types. To select the number of peaks in the segmentation, we propose a supervised penalty learning model. T...
متن کاملForging The Graphs: A Low Rank and Positive Semidefinite Graph Learning Approach
In many graph-based machine learning and data mining approaches, the quality of the graph is critical. However, in real-world applications, especially in semisupervised learning and unsupervised learning, the evaluation of the quality of a graph is often expensive and sometimes even impossible, due the cost or the unavailability of ground truth. In this paper, we proposed a robust approach with...
متن کاملUnsupervised Learning of Predictors from Unpaired Input-Output Samples
Unsupervised learning is the most challenging problem in machine learning and especially in deep learning. Among many scenarios, we study an unsupervised learning problem of high economic value — learning to predict without costly pairing of input data and corresponding labels. Part of the difficulty in this problem is a lack of solid evaluation measures. In this paper, we take a practical appr...
متن کاملMultitask kernel-based learning with first-order logic constraints
In this paper we propose a general framework to integrate supervised and unsupervised examples with background knowledge expressed by a collection of first-order logic clauses into kernel machines. In particular, we consider a multi-task learning scheme where multiple predicates defined on a set of objects are to be jointly learned from examples, enforcing a set of FOL constraints on the admiss...
متن کاملMeta-Unsupervised-Learning: A supervised approach to unsupervised learning
We introduce a new paradigm to investigate unsupervised learning, reducing unsupervised learning to supervised learning. Specifically, we mitigate the subjectivity in unsupervised decision-making by leveraging knowledge acquired from prior, possibly heterogeneous, supervised learning tasks. We demonstrate the versatility of our framework via comprehensive expositions and detailed experiments on...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Journal of machine learning research : JMLR
دوره 14 7 شماره
صفحات -
تاریخ انتشار 2013